Annotators' Agreement: The Case of Topic-Focus Articulation
Abstract
The annotation of the Prague Dependency Treebank (PDT) is conceived as a multilayered scenario that also comprises dependency representations (tectogrammatical tree structures, TGTSs) of the underlying structure of sentences. TGTSs capture three basic aspects of this underlying structure: (a) the dependency tree structure, (b) the kinds of dependency syntactic relations, and (c) the basic characteristics of the topic-focus articulation (TFA). Since the PDT is a large collection and the annotations on the deepest layer are largely performed by several human annotators (working from the output of an automatic preprocessing module), it is essential to monitor the consistency of individual annotators and the agreement among them. In the present paper, we summarize the results of the evaluation of parallel annotations of several samples taken from the PDT, and the measures adopted to improve the consistency of annotations.
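As an illustration of what evaluating parallel annotations can involve, the sketch below computes raw agreement and Cohen's kappa for two annotators who labelled the same nodes with a TFA value. The abstract does not say which agreement measure the authors used, so the choice of kappa is an assumption, as are the function name `cohen_kappa`, the label set `t`/`c`/`f`, and the sample data, which is invented purely for demonstration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Return (observed agreement, Cohen's kappa) for two parallel label lists.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same label
    # if each annotator labelled independently with their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return observed, 1.0

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Invented sample: TFA values assigned by two annotators to the same eight nodes.
annotator_a = ["t", "f", "f", "c", "t", "f", "t", "f"]
annotator_b = ["t", "f", "t", "c", "t", "f", "f", "f"]

observed, kappa = cohen_kappa(annotator_a, annotator_b)
print(f"observed agreement = {observed:.3f}, kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label is much more frequent than the others. It applies to exactly two annotators; comparing more would need a different statistic.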
Similar resources
Corpus Annotation on the Tectogrammatical Layer: Summarizing of the First Stages of Evaluations
We summarize here the results of a series of evaluations of the annotators’ assignments of tectogrammatical (i.e. underlying syntactic) tree structures and of the values of the edges as well as the values of the attribute representing the topic-focus articulation of the sentences, within the large-scale project of the Prague Dependency Treebank.
What can linguists learn from some simple statistics on annotated treebanks
The goal of the present contribution is rather modest: to collect simple statistics carried out on different layers of the annotation scenario of the Prague Dependency Treebank (PDT; [1]) in order to illustrate their usefulness for linguistic research, either by supporting existing hypotheses or suggesting new research questions or new explanations of the existing ones. For this purpose, we hav...
The Prague Dependency Treebank: Crossing the Sentence Boundary
The units processed by tagging procedures both automatic and manual are sentences (as occurring in the texts in the corpus), but the human annotators are instructed to assign (disambiguated) structures according to the meaning of the sentence in its environment, taking contextual (and factual) information into account. We focus in the paper on two issues: how to capture (i) the topic-focus arti...
Let's Agree to Disagree: Measuring Agreement between Annotators for Opinion Mining Task
There is a need to know to what degree humans can agree when classifying a sentence as carrying some sentiment orientation. However, little research has been done on assessing the agreement between annotators for the different opinion mining tasks. In this work, we present an assessment of agreement between two human annotators. The task was to manually classify newspaper sentences into one...
Comparing transcription agreement on non-native English speech corpus between native and non-native annotators
This paper compares transcription agreement between native and non-native annotators on a corpus of non-native English speech produced by Korean learners. Ten non-native annotators and three native annotators participate in the transcription of 608 sentences. All annotators are provided with forced-aligned phone sequences, which they are to correct wherever the phones are realized differently. The...